AWS Bedrock Knowledge Base

AWS Bedrock API Guide

Overview

Amazon Bedrock is a fully managed service that provides access to foundation models (FMs) from leading AI companies through a unified API. This guide covers how to use the Bedrock APIs and configure inference parameters.

Prerequisites

Authentication

import boto3

# Create a Bedrock client
bedrock = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-east-1'
)
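
If your credentials come from a named profile, or you also need the control-plane client used later in this guide for guardrail and model management, a session-based setup is a common pattern (the profile name here is illustrative):

import boto3

# A minimal sketch, assuming a local profile named "bedrock-dev"
session = boto3.Session(profile_name="bedrock-dev", region_name="us-east-1")

# Runtime client: InvokeModel / Converse calls
bedrock_runtime = session.client("bedrock-runtime")

# Control-plane client: model listing, guardrail management
bedrock_control = session.client("bedrock")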

Available Models

Bedrock provides access to multiple foundation models, including Anthropic Claude, Amazon Titan, Meta Llama, Mistral AI, and Cohere Command (see the model compatibility list later in this guide).

Basic API Usage

Invoke Model (Synchronous)

import json

model_id = "anthropic.claude-3-sonnet-20240229-v1:0"

prompt = "What is machine learning?"

# Format request body based on model
request_body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [
        {
            "role": "user",
            "content": prompt
        }
    ]
}

response = bedrock.invoke_model(
    modelId=model_id,
    body=json.dumps(request_body)
)

response_body = json.loads(response['body'].read())
print(response_body['content'][0]['text'])
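
In production the same call is usually wrapped in error handling. A minimal sketch using botocore's ClientError (the error codes shown are common Bedrock ones; adjust retry behavior to your needs):

from botocore.exceptions import ClientError

try:
    response = bedrock.invoke_model(
        modelId=model_id,
        body=json.dumps(request_body)
    )
    response_body = json.loads(response['body'].read())
    print(response_body['content'][0]['text'])
except ClientError as e:
    error_code = e.response['Error']['Code']
    if error_code == 'ThrottlingException':
        print("Request was throttled; retry with exponential backoff")
    elif error_code == 'AccessDeniedException':
        print("Check that model access is enabled for this account and region")
    else:
        raise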

Invoke Model with Response Stream

Streaming allows you to receive model responses incrementally as they're generated, rather than waiting for the complete response. This is crucial for:

- Better user experience - Users see output immediately
- Long responses - Start processing before completion
- Real-time applications - Chat interfaces, live content generation
- Reduced perceived latency - Feels faster even if total time is similar

import json

response = bedrock.invoke_model_with_response_stream(
    modelId=model_id,
    body=json.dumps(request_body)
)

# Process the stream
stream = response.get('body')
if stream:
    for event in stream:
        chunk = event.get('chunk')
        if chunk:
            chunk_data = json.loads(chunk.get('bytes').decode())

            # For Claude models
            if chunk_data.get('type') == 'content_block_delta':
                text = chunk_data.get('delta', {}).get('text', '')
                print(text, end='', flush=True)

            # Check for completion
            if chunk_data.get('type') == 'message_stop':
                print("\n[Stream completed]")

Advanced Streaming Example with Error Handling:

def stream_bedrock_response(bedrock_client, model_id, request_body):
    """
    Stream response from Bedrock with proper error handling
    """
    try:
        response = bedrock_client.invoke_model_with_response_stream(
            modelId=model_id,
            body=json.dumps(request_body)
        )

        full_response = ""
        stream = response.get('body')

        for event in stream:
            chunk = event.get('chunk')
            if chunk:
                chunk_data = json.loads(chunk.get('bytes').decode())

                # Handle different event types
                if chunk_data.get('type') == 'message_start':
                    print("[Streaming started]")

                elif chunk_data.get('type') == 'content_block_start':
                    print("[Content block started]")

                elif chunk_data.get('type') == 'content_block_delta':
                    delta = chunk_data.get('delta', {})
                    if delta.get('type') == 'text_delta':
                        text = delta.get('text', '')
                        full_response += text
                        print(text, end='', flush=True)

                elif chunk_data.get('type') == 'content_block_stop':
                    print("\n[Content block completed]")

                elif chunk_data.get('type') == 'message_delta':
                    # Contains usage statistics
                    usage = chunk_data.get('usage', {})
                    print(f"\n[Output tokens: {usage.get('output_tokens', 0)}]")

                elif chunk_data.get('type') == 'message_stop':
                    print("[Stream completed]")
                    break

        return full_response

    except Exception as e:
        print(f"Streaming error: {e}")
        raise

# Usage
full_text = stream_bedrock_response(bedrock, model_id, request_body)

Streaming with Titan Models:

response = bedrock.invoke_model_with_response_stream(
    modelId="amazon.titan-text-express-v1",
    body=json.dumps({
        "inputText": "Write a story about AI",
        "textGenerationConfig": {
            "maxTokenCount": 512,
            "temperature": 0.7
        }
    })
)

for event in response.get('body'):
    chunk = json.loads(event['chunk']['bytes'])
    if 'outputText' in chunk:
        print(chunk['outputText'], end='', flush=True)

Message Structure and Roles

Understanding how to structure messages is fundamental to working with Bedrock models effectively. Messages define the conversation flow and context.

Understanding Message Roles

Every message in a conversation has a role that identifies who is speaking. There are three primary roles:

1. User Role

{
    "role": "user",
    "content": [
        {"text": "What is machine learning?"}
    ]
}

2. Assistant Role

{
    "role": "assistant",
    "content": [
        {"text": "Machine learning is a subset of artificial intelligence..."}
    ]
}

3. System Role (Special)

# System prompt is separate from messages
system = [
    {"text": "You are a helpful AI assistant specializing in Python programming."}
]

messages = [
    {"role": "user", "content": [{"text": "How do I use decorators?"}]}
]

Message Content Structure

Messages use a structured content format that supports different content types:

Text Content (Most Common)

message = {
    "role": "user",
    "content": [
        {
            "text": "Explain quantum computing"
        }
    ]
}

Multiple Content Blocks

You can include multiple content blocks in a single message:

message = {
    "role": "user",
    "content": [
        {"text": "Here's my code:"},
        {"text": "def hello():\n    print('Hello')"},
        {"text": "What does it do?"}
    ]
}

Image Content (Vision Models)

For models that support vision (like Claude 3):

# Read the image as raw bytes
with open("image.jpg", "rb") as f:
    image_bytes = f.read()

message = {
    "role": "user",
    "content": [
        {
            "image": {
                "format": "jpeg",  # or "png", "gif", "webp"
                "source": {
                    # With boto3 and the Converse API, pass raw bytes;
                    # the SDK handles base64 encoding for you
                    "bytes": image_bytes
                }
            }
        },
        {"text": "What's in this image?"}
    ]
}
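
Note that the block above uses the Converse API content format, where boto3 accepts raw image bytes. If you call Claude through InvokeModel with the Anthropic-native request body instead, the image block has a different shape; a minimal sketch reusing the bytes read above:

import base64

# Anthropic-native (InvokeModel) content blocks use explicit "type" fields
# and a base64-encoded payload
image_base64 = base64.b64encode(image_bytes).decode('utf-8')

anthropic_message = {
    "role": "user",
    "content": [
        {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/jpeg",
                "data": image_base64
            }
        },
        {"type": "text", "text": "What's in this image?"}
    ]
}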

Building Conversations

Conversations are built by alternating between user and assistant messages:

# Single turn conversation
messages = [
    {
        "role": "user",
        "content": [{"text": "What is Python?"}]
    }
]

# Multi-turn conversation
messages = [
    # Turn 1
    {
        "role": "user",
        "content": [{"text": "What is Python?"}]
    },
    {
        "role": "assistant",
        "content": [{"text": "Python is a high-level programming language..."}]
    },
    # Turn 2
    {
        "role": "user",
        "content": [{"text": "What are its main features?"}]
    },
    {
        "role": "assistant",
        "content": [{"text": "Python's main features include..."}]
    },
    # Turn 3 (current)
    {
        "role": "user",
        "content": [{"text": "Show me an example"}]
    }
]

Message Validation Rules

Bedrock enforces specific rules for message structure:

Rule 1: Alternating Roles

Messages must alternate between user and assistant:

# ✅ VALID
messages = [
    {"role": "user", "content": [{"text": "Hello"}]},
    {"role": "assistant", "content": [{"text": "Hi there!"}]},
    {"role": "user", "content": [{"text": "How are you?"}]}
]

# ❌ INVALID - Two user messages in a row
messages = [
    {"role": "user", "content": [{"text": "Hello"}]},
    {"role": "user", "content": [{"text": "Are you there?"}]}
]

Solution: Combine multiple user inputs into one message:

# ✅ VALID - Combined into single user message
messages = [
    {
        "role": "user",
        "content": [
            {"text": "Hello"},
            {"text": "Are you there?"}
        ]
    }
]

Rule 2: Start with User

Conversations must always start with a user message:

# ✅ VALID
messages = [
    {"role": "user", "content": [{"text": "Hello"}]}
]

# ❌ INVALID - Starts with assistant
messages = [
    {"role": "assistant", "content": [{"text": "Hello"}]}
]

Rule 3: End with User

The last message should be from the user (the turn you want a response to):

# ✅ VALID - Ends with user message
messages = [
    {"role": "user", "content": [{"text": "What is AI?"}]},
    {"role": "assistant", "content": [{"text": "AI is..."}]},
    {"role": "user", "content": [{"text": "Tell me more"}]}
]

# ⚠️ NOT USEFUL - Ends with assistant
messages = [
    {"role": "user", "content": [{"text": "What is AI?"}]},
    {"role": "assistant", "content": [{"text": "AI is..."}]}
]
# The request may be accepted, but there is no new user turn for the model to answer

Rule 4: Content Must Not Be Empty

Every message must have at least one content block:

# ✅ VALID
{"role": "user", "content": [{"text": "Hello"}]}

# ❌ INVALID - Empty content
{"role": "user", "content": []}

Practical Message Management

Here's a helper class for managing message structure:

class MessageBuilder:
    """Helper class for building valid Bedrock message structures"""

    def __init__(self):
        self.messages = []

    def add_user_message(self, text: str):
        """Add a user message"""
        self.messages.append({
            "role": "user",
            "content": [{"text": text}]
        })
        return self

    def add_assistant_message(self, text: str):
        """Add an assistant message"""
        self.messages.append({
            "role": "assistant",
            "content": [{"text": text}]
        })
        return self

    def add_user_message_with_image(self, text: str, image_bytes: bytes, image_format: str = "jpeg"):
        """Add a user message with an image (raw bytes; boto3 handles base64 for the Converse API)"""
        self.messages.append({
            "role": "user",
            "content": [
                {
                    "image": {
                        "format": image_format,
                        "source": {"bytes": image_bytes}
                    }
                },
                {"text": text}
            ]
        })
        return self

    def validate(self) -> bool:
        """Validate message structure"""
        if not self.messages:
            return False

        # Must start with user
        if self.messages[0]["role"] != "user":
            return False

        # Must end with user
        if self.messages[-1]["role"] != "user":
            return False

        # Check alternating roles
        for i in range(len(self.messages) - 1):
            current_role = self.messages[i]["role"]
            next_role = self.messages[i + 1]["role"]
            if current_role == next_role:
                return False

        return True

    def get_messages(self):
        """Get the messages list"""
        if not self.validate():
            raise ValueError("Invalid message structure")
        return self.messages

    def clear(self):
        """Clear all messages"""
        self.messages = []
        return self

# Usage
builder = MessageBuilder()
builder.add_user_message("What is Python?")
builder.add_assistant_message("Python is a programming language...")
builder.add_user_message("Show me an example")

messages = builder.get_messages()
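
Messages built this way are already in the Converse API content format, so they can be passed straight to converse (covered in detail later in this guide); a minimal sketch using the same client and model ID as earlier examples:

response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=messages
)
print(response['output']['message']['content'][0]['text'])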

System Prompts: Controlling Model Behavior

System prompts are one of the most powerful tools for controlling how AI models behave. They set the context, personality, and constraints for the entire conversation.

What Are System Prompts?

A system prompt is a special instruction given to the model before any user messages. It defines the model's role, personality, behavioral guidelines, and constraints for the conversation.

Think of it as the model's "job description" for the conversation.

How System Prompts Work

System prompts are processed differently from regular messages:

# Traditional approach (InvokeModel with Claude)
request_body = {
    "anthropic_version": "bedrock-2023-05-31",
    "system": "You are a helpful Python programming assistant.",  # System prompt
    "messages": [
        {"role": "user", "content": "How do I use decorators?"}
    ],
    "max_tokens": 1024
}

# Converse API approach
response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    system=[
        {"text": "You are a helpful Python programming assistant."}
    ],
    messages=[
        {"role": "user", "content": [{"text": "How do I use decorators?"}]}
    ]
)

Key characteristics:

- System prompts are always processed first
- They apply to the entire conversation
- They don't count as a conversation turn
- They strongly influence model behavior

System Prompt Best Practices

1. Be Specific and Clear

# ❌ Vague
system_prompt = "Be helpful."

# ✅ Specific
system_prompt = """You are a Python programming tutor. 
When explaining concepts:
- Use simple language suitable for beginners
- Provide code examples for every concept
- Explain what each line of code does
- Suggest exercises for practice"""

2. Define the Role/Persona

# Customer service bot
system_prompt = """You are a friendly customer service representative for TechCorp.
- Always greet customers warmly
- Be patient and empathetic
- Provide clear, step-by-step solutions
- If you can't help, offer to escalate to a human agent
- Never make promises about refunds or replacements without checking policies"""

3. Set Boundaries and Constraints

system_prompt = """You are a medical information assistant.

What you CAN do:
- Provide general health information
- Explain medical terms
- Suggest when to see a doctor

What you CANNOT do:
- Diagnose conditions
- Prescribe medications
- Replace professional medical advice

Always remind users to consult healthcare professionals for personal medical advice."""

4. Specify Output Format

system_prompt = """You are a code review assistant.

For each code review, provide:
1. Overall assessment (Good/Needs Improvement/Poor)
2. Strengths (bullet points)
3. Issues found (bullet points with severity: Critical/Major/Minor)
4. Specific recommendations
5. Refactored code example (if needed)

Use markdown formatting for clarity."""

System Prompt Examples by Use Case

Code Assistant

system_prompt = """You are an expert software engineer specializing in Python, JavaScript, and system design.

Guidelines:
- Write clean, idiomatic code following best practices
- Include error handling and edge cases
- Add clear comments explaining complex logic
- Suggest performance optimizations when relevant
- Consider security implications
- Provide type hints for Python code
- Follow PEP 8 style guide for Python
- Use ES6+ features for JavaScript

When reviewing code:
- Point out bugs and potential issues
- Suggest improvements for readability and maintainability
- Explain the reasoning behind your suggestions"""

Content Writer

system_prompt = """You are a professional content writer specializing in technical blog posts.

Writing style:
- Clear and engaging tone
- Use active voice
- Short paragraphs (3-4 sentences max)
- Include relevant examples
- Add subheadings for structure
- Use bullet points for lists
- Write for a technical but not expert audience

Structure:
1. Compelling introduction with a hook
2. Main content with clear sections
3. Practical examples or code snippets
4. Key takeaways or conclusion
5. Call to action

Avoid:
- Jargon without explanation
- Overly long sentences
- Passive voice
- Fluff or filler content"""

Data Analyst

system_prompt = """You are a data analyst expert helping users understand and analyze data.

When analyzing data:
1. Start with summary statistics
2. Identify patterns and trends
3. Point out anomalies or outliers
4. Suggest relevant visualizations
5. Provide actionable insights

When writing code:
- Use pandas for data manipulation
- Use matplotlib/seaborn for visualization
- Include comments explaining each step
- Handle missing data appropriately
- Validate assumptions

Always explain your analytical approach and reasoning."""

Educational Tutor

system_prompt = """You are a patient and encouraging tutor for high school mathematics.

Teaching approach:
- Break down complex problems into simple steps
- Use analogies and real-world examples
- Check understanding before moving forward
- Encourage students when they struggle
- Celebrate correct answers
- Guide students to find answers rather than giving them directly

When a student makes a mistake:
- Don't just say it's wrong
- Help them identify where they went wrong
- Guide them to the correct approach
- Reinforce the underlying concept

Use encouraging language like:
- "Great start! Let's think about..."
- "You're on the right track..."
- "That's a common mistake, let's see why..."
"""

API Documentation Helper

system_prompt = """You are an API documentation expert helping developers understand and use APIs.

When explaining APIs:
1. Provide endpoint URL and HTTP method
2. List all parameters (required vs optional)
3. Show request example with sample data
4. Show response example with explanation
5. List possible error codes and meanings
6. Include authentication requirements
7. Provide code examples in multiple languages (Python, JavaScript, cURL)

Format responses as:
- Clear section headers
- Code blocks with syntax highlighting
- Tables for parameters
- Warning boxes for important notes

Always include working, copy-paste ready examples."""

System Prompt Management Strategies

Strategy 1: Template-Based System Prompts

Create reusable templates with placeholders:

class SystemPromptTemplates:
    """Reusable system prompt templates"""

    CUSTOMER_SERVICE = """You are a {tone} customer service representative for {company_name}.

Product knowledge:
{product_info}

Policies:
{policies}

Response guidelines:
- Always greet customers by name if provided
- Be {tone} and professional
- Provide solutions within {response_time}
- Escalate to human if: {escalation_criteria}"""

    CODE_REVIEWER = """You are a {language} code reviewer with {experience_level} expertise.

Focus areas:
{focus_areas}

Standards to enforce:
{coding_standards}

Severity levels:
- Critical: {critical_definition}
- Major: {major_definition}  
- Minor: {minor_definition}"""

    @staticmethod
    def create_customer_service_prompt(company_name, tone="friendly", **kwargs):
        return SystemPromptTemplates.CUSTOMER_SERVICE.format(
            company_name=company_name,
            tone=tone,
            product_info=kwargs.get('product_info', 'General products'),
            policies=kwargs.get('policies', 'Standard policies'),
            response_time=kwargs.get('response_time', '24 hours'),
            escalation_criteria=kwargs.get('escalation_criteria', 'Complex issues')
        )

# Usage
prompt = SystemPromptTemplates.create_customer_service_prompt(
    company_name="TechCorp",
    tone="professional and empathetic",
    product_info="Cloud hosting services",
    policies="30-day money-back guarantee, 24/7 support"
)

Strategy 2: Layered System Prompts

Build complex prompts from modular components:

class SystemPromptBuilder:
    """Build system prompts from modular components"""

    def __init__(self):
        self.components = []

    def add_role(self, role: str):
        """Define the model's role"""
        self.components.append(f"You are {role}.")
        return self

    def add_expertise(self, areas: list):
        """Define areas of expertise"""
        expertise = "Your areas of expertise include:\n" + "\n".join(f"- {area}" for area in areas)
        self.components.append(expertise)
        return self

    def add_guidelines(self, guidelines: list):
        """Add behavioral guidelines"""
        guide_text = "Guidelines:\n" + "\n".join(f"- {g}" for g in guidelines)
        self.components.append(guide_text)
        return self

    def add_constraints(self, constraints: list):
        """Add constraints/limitations"""
        constraint_text = "Constraints:\n" + "\n".join(f"- {c}" for c in constraints)
        self.components.append(constraint_text)
        return self

    def add_output_format(self, format_description: str):
        """Specify output format"""
        self.components.append(f"Output format:\n{format_description}")
        return self

    def add_examples(self, examples: list):
        """Add example interactions"""
        example_text = "Examples:\n" + "\n\n".join(examples)
        self.components.append(example_text)
        return self

    def build(self) -> str:
        """Build the final system prompt"""
        return "\n\n".join(self.components)

# Usage
prompt = (SystemPromptBuilder()
    .add_role("an expert Python developer and teacher")
    .add_expertise([
        "Python programming (beginner to advanced)",
        "Web development with Django and Flask",
        "Data science with pandas and numpy",
        "Best practices and design patterns"
    ])
    .add_guidelines([
        "Explain concepts clearly with examples",
        "Write clean, well-commented code",
        "Consider edge cases and error handling",
        "Suggest best practices and optimizations"
    ])
    .add_constraints([
        "Only provide Python 3.8+ compatible code",
        "Avoid deprecated features",
        "Don't use external libraries unless necessary"
    ])
    .add_output_format("""
1. Brief explanation of the concept
2. Code example with comments
3. Expected output
4. Common pitfalls to avoid
    """)
    .build()
)

print(prompt)

Strategy 3: Dynamic System Prompts

Adjust system prompts based on context:

class DynamicSystemPrompts:
    """Generate context-aware system prompts"""

    @staticmethod
    def for_user_level(user_level: str, domain: str):
        """Generate prompt based on user expertise level"""

        level_configs = {
            "beginner": {
                "tone": "patient and encouraging",
                "detail": "Explain every step in detail",
                "examples": "Use simple, relatable examples",
                "jargon": "Avoid technical jargon or explain it clearly"
            },
            "intermediate": {
                "tone": "professional and informative",
                "detail": "Provide clear explanations with some technical depth",
                "examples": "Use practical, real-world examples",
                "jargon": "Use technical terms but explain complex ones"
            },
            "advanced": {
                "tone": "technical and precise",
                "detail": "Focus on advanced concepts and edge cases",
                "examples": "Use sophisticated examples and best practices",
                "jargon": "Use technical terminology freely"
            }
        }

        config = level_configs.get(user_level, level_configs["intermediate"])

        return f"""You are a {config['tone']} {domain} expert.

Communication style:
- {config['detail']}
- {config['examples']}
- {config['jargon']}

Adjust your responses to match the {user_level} level of expertise."""

    @staticmethod
    def for_task_type(task_type: str):
        """Generate prompt based on task type"""

        task_prompts = {
            "debug": """You are a debugging expert.

Approach:
1. Analyze the error message carefully
2. Identify the root cause
3. Explain why the error occurred
4. Provide the fix with explanation
5. Suggest how to prevent similar issues

Be systematic and thorough.""",

            "optimize": """You are a performance optimization expert.

Approach:
1. Analyze current implementation
2. Identify bottlenecks
3. Suggest optimizations with trade-offs
4. Provide benchmarking approach
5. Consider scalability

Focus on measurable improvements.""",

            "design": """You are a software architecture expert.

Approach:
1. Understand requirements thoroughly
2. Consider scalability and maintainability
3. Suggest design patterns where appropriate
4. Discuss trade-offs of different approaches
5. Provide clear diagrams or pseudocode

Think long-term and holistically."""
        }

        return task_prompts.get(task_type, "You are a helpful assistant.")

# Usage
prompt = DynamicSystemPrompts.for_user_level("beginner", "Python programming")
# or
prompt = DynamicSystemPrompts.for_task_type("debug")

Advanced System Prompt Techniques

Technique 1: Few-Shot Examples in System Prompts

Include examples of desired behavior:

system_prompt = """You are a sentiment analysis assistant.

Analyze the sentiment of user messages and respond in this exact format:

Example 1:
User: "I love this product! It's amazing!"
Analysis: Positive (confidence: 95%)
Key emotions: joy, satisfaction
Tone: enthusiastic

Example 2:
User: "This is terrible. Worst purchase ever."
Analysis: Negative (confidence: 98%)
Key emotions: anger, disappointment
Tone: frustrated

Example 3:
User: "It's okay, I guess. Nothing special."
Analysis: Neutral (confidence: 75%)
Key emotions: indifference
Tone: lukewarm

Now analyze user messages following this exact format."""

Technique 2: Chain-of-Thought Prompting

Encourage step-by-step reasoning:

system_prompt = """You are a math problem solver.

For every problem, follow this thinking process:

1. UNDERSTAND: Restate the problem in your own words
2. PLAN: Identify what approach or formula to use
3. SOLVE: Work through the solution step-by-step
4. CHECK: Verify your answer makes sense

Show your work for each step. Think out loud.

Example:
Problem: "If a train travels 120 miles in 2 hours, what's its average speed?"

UNDERSTAND: We need to find average speed given distance and time.
PLAN: Use the formula: speed = distance / time
SOLVE: 
  - Distance = 120 miles
  - Time = 2 hours
  - Speed = 120 / 2 = 60 miles per hour
CHECK: 60 mph × 2 hours = 120 miles ✓

Answer: 60 miles per hour"""

Technique 3: Role-Playing with Constraints

Create specific personas with detailed constraints:

system_prompt = """You are Sherlock Holmes, the famous detective.

Personality traits:
- Highly observant and analytical
- Sometimes condescending but well-meaning
- Uses deductive reasoning
- References obscure knowledge
- Speaks in Victorian English style

When analyzing problems:
- Point out details others miss
- Make logical deductions
- Explain your reasoning process
- Occasionally reference past cases
- Show confidence in your conclusions

Speech patterns:
- "Elementary, my dear Watson"
- "I observe that..."
- "It is quite evident that..."
- "The facts are these..."

Stay in character at all times."""

Technique 4: Structured Output Enforcement

Force specific output structures:

system_prompt = """You are a code review bot.

You MUST respond in this exact JSON structure:

{
  "overall_score": <number 1-10>,
  "summary": "<one sentence summary>",
  "strengths": [
    "<strength 1>",
    "<strength 2>"
  ],
  "issues": [
    {
      "severity": "<critical|major|minor>",
      "line": <line number>,
      "description": "<issue description>",
      "suggestion": "<how to fix>"
    }
  ],
  "recommendations": [
    "<recommendation 1>",
    "<recommendation 2>"
  ]
}

Do not include any text outside this JSON structure.
Ensure the JSON is valid and properly formatted."""
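
When you enforce a JSON-only structure like this, it helps to validate the output before using it. A minimal sketch, assuming the system prompt above and the Converse API client shown elsewhere in this guide:

import json

response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    system=[{"text": system_prompt}],
    messages=[{"role": "user", "content": [{"text": "def add(a, b): return a + b"}]}]
)

raw_text = response['output']['message']['content'][0]['text']

try:
    review = json.loads(raw_text)
    print(f"Overall score: {review['overall_score']}")
except json.JSONDecodeError:
    # Models occasionally wrap the JSON in extra prose; retry or strip it
    print("Response was not valid JSON; consider retrying")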

Converse API: Unified Interface for All Models

The Converse API is a newer, standardized way to interact with Bedrock models. Instead of dealing with model-specific request formats, you use a single, consistent interface that works across all models.

Why Use Converse API?

Traditional approach (InvokeModel):

- Each model has different request/response formats
- You need to know Claude's format vs Titan's format vs Llama's format
- Switching models requires code changes
- More complex to maintain

Converse API approach:

- Single, unified format for all models
- Switch models by just changing the model ID
- Cleaner, more maintainable code
- Built-in support for multi-turn conversations
- Automatic handling of system prompts and tool use

Basic Converse API Usage

# Simple conversation with any model
response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[
        {
            "role": "user",
            "content": [
                {"text": "What is machine learning?"}
            ]
        }
    ],
    inferenceConfig={
        "maxTokens": 512,
        "temperature": 0.7,
        "topP": 0.9
    }
)

# Extract the response
output_message = response['output']['message']
response_text = output_message['content'][0]['text']
print(response_text)

# Check token usage
usage = response['usage']
print(f"Input tokens: {usage['inputTokens']}")
print(f"Output tokens: {usage['outputTokens']}")
print(f"Total tokens: {usage['totalTokens']}")

Multi-Turn Conversations

The Converse API makes it easy to maintain conversation history:

# Build a conversation
conversation_history = []

# First turn
conversation_history.append({
    "role": "user",
    "content": [{"text": "What is Python?"}]
})

response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=conversation_history
)

# Add assistant's response to history
assistant_message = response['output']['message']
conversation_history.append(assistant_message)
print(f"Assistant: {assistant_message['content'][0]['text']}")

# Second turn - model remembers context
conversation_history.append({
    "role": "user",
    "content": [{"text": "What are its main features?"}]
})

response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=conversation_history
)

assistant_message = response['output']['message']
print(f"Assistant: {assistant_message['content'][0]['text']}")

System Prompts with Converse API

System prompts set the behavior and personality of the model:

response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[
        {
            "role": "user",
            "content": [{"text": "Explain recursion"}]
        }
    ],
    system=[
        {"text": "You are an expert computer science teacher. Explain concepts using simple analogies and examples."}
    ],
    inferenceConfig={
        "maxTokens": 1000,
        "temperature": 0.7
    }
)

Converse API with Streaming

Stream responses for better user experience:

def converse_stream(bedrock_client, model_id, messages, inference_config=None):
    """
    Stream responses using Converse API
    """
    response = bedrock_client.converse_stream(
        modelId=model_id,
        messages=messages,
        inferenceConfig=inference_config or {
            "maxTokens": 2048,
            "temperature": 0.7
        }
    )

    full_text = ""

    # Process the stream
    for event in response['stream']:
        # Content block delta - this contains the actual text
        if 'contentBlockDelta' in event:
            delta = event['contentBlockDelta']['delta']
            if 'text' in delta:
                text = delta['text']
                full_text += text
                print(text, end='', flush=True)

        # Metadata about the response
        elif 'metadata' in event:
            metadata = event['metadata']
            if 'usage' in metadata:
                usage = metadata['usage']
                print(f"\n\n[Tokens used: {usage['totalTokens']}]")

        # Message stop - end of response
        elif 'messageStop' in event:
            stop_reason = event['messageStop']['stopReason']
            print(f"\n[Stopped: {stop_reason}]")

    return full_text

# Usage
messages = [
    {
        "role": "user",
        "content": [{"text": "Write a short story about a robot"}]
    }
]

result = converse_stream(
    bedrock,
    "anthropic.claude-3-sonnet-20240229-v1:0",
    messages,
    {"maxTokens": 1500, "temperature": 0.8}
)

Inference Configuration in Converse API

The inferenceConfig parameter standardizes inference parameters across all models:

inference_config = {
    "maxTokens": 2048,        # Maximum tokens to generate
    "temperature": 0.7,       # Randomness (0.0-1.0)
    "topP": 0.9,             # Nucleus sampling (0.0-1.0)
    "stopSequences": ["\n\n", "END"]  # Stop generation triggers
}

response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=messages,
    inferenceConfig=inference_config
)

Note: Not all parameters are supported by all models. The Converse API handles this gracefully by using what's available.
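
For model-specific parameters that inferenceConfig does not cover (for example Claude's top_k), the Converse API also accepts an additionalModelRequestFields dictionary that is passed through to the underlying model; a short sketch:

response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=messages,
    inferenceConfig=inference_config,
    additionalModelRequestFields={
        "top_k": 250  # Claude-specific sampling parameter not exposed in inferenceConfig
    }
)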

Complete Converse API Example

import boto3
import json

class ConverseClient:
    """
    A clean wrapper around Bedrock's Converse API
    """
    def __init__(self, region_name='us-east-1'):
        self.client = boto3.client(
            service_name='bedrock-runtime',
            region_name=region_name
        )
        self.conversation_history = []

    def send_message(
        self,
        message: str,
        model_id: str = "anthropic.claude-3-sonnet-20240229-v1:0",
        system_prompt: str = None,
        temperature: float = 0.7,
        max_tokens: int = 2048,
        stream: bool = False
    ):
        """
        Send a message and get a response
        """
        # Add user message to history
        self.conversation_history.append({
            "role": "user",
            "content": [{"text": message}]
        })

        # Prepare request parameters
        request_params = {
            "modelId": model_id,
            "messages": self.conversation_history,
            "inferenceConfig": {
                "maxTokens": max_tokens,
                "temperature": temperature,
                "topP": 0.9
            }
        }

        # Add system prompt if provided
        if system_prompt:
            request_params["system"] = [{"text": system_prompt}]

        # Choose streaming or non-streaming
        if stream:
            return self._stream_response(request_params)
        else:
            return self._get_response(request_params)

    def _get_response(self, request_params):
        """Get complete response at once"""
        response = self.client.converse(**request_params)

        # Extract response text
        assistant_message = response['output']['message']
        response_text = assistant_message['content'][0]['text']

        # Add to conversation history
        self.conversation_history.append(assistant_message)

        # Return response with metadata
        return {
            'text': response_text,
            'usage': response['usage'],
            'stop_reason': response['stopReason']
        }

    def _stream_response(self, request_params):
        """Stream response in real-time"""
        response = self.client.converse_stream(**request_params)

        full_text = ""
        usage_info = None
        stop_reason = None

        for event in response['stream']:
            if 'contentBlockDelta' in event:
                delta = event['contentBlockDelta']['delta']
                if 'text' in delta:
                    text = delta['text']
                    full_text += text
                    print(text, end='', flush=True)

            elif 'metadata' in event:
                if 'usage' in event['metadata']:
                    usage_info = event['metadata']['usage']

            elif 'messageStop' in event:
                stop_reason = event['messageStop']['stopReason']

        print()  # New line after streaming

        # Add assistant response to history
        self.conversation_history.append({
            "role": "assistant",
            "content": [{"text": full_text}]
        })

        return {
            'text': full_text,
            'usage': usage_info,
            'stop_reason': stop_reason
        }

    def reset_conversation(self):
        """Clear conversation history"""
        self.conversation_history = []

    def get_history(self):
        """Get current conversation history"""
        return self.conversation_history


# Usage Example
if __name__ == "__main__":
    # Create client
    client = ConverseClient()

    # Set system prompt for the conversation
    system_prompt = "You are a helpful AI assistant specializing in Python programming."

    # First message
    response = client.send_message(
        "What is a decorator in Python?",
        system_prompt=system_prompt,
        temperature=0.5
    )
    print(f"Assistant: {response['text']}")
    print(f"Tokens used: {response['usage']['totalTokens']}\n")

    # Follow-up message (context is maintained)
    response = client.send_message(
        "Can you show me an example?",
        temperature=0.5
    )
    print(f"Assistant: {response['text']}")
    print(f"Tokens used: {response['usage']['totalTokens']}\n")

    # Stream a response
    print("Assistant: ", end='')
    response = client.send_message(
        "Explain how decorators work internally",
        temperature=0.5,
        stream=True
    )
    print(f"\nTokens used: {response['usage']['totalTokens']}")

Converse API vs InvokeModel: When to Use Each

Use Converse API when:

- Building conversational applications
- You want model-agnostic code
- You need multi-turn conversation support
- You want cleaner, more maintainable code
- You're starting a new project

Use InvokeModel when:

- You need model-specific features not in Converse API
- You're working with existing code
- You need maximum control over request format
- You're using advanced model-specific parameters

Model Compatibility

The Converse API works with:

- ✅ Anthropic Claude 3 (Opus, Sonnet, Haiku)
- ✅ Anthropic Claude 2.x
- ✅ Amazon Titan Text models
- ✅ Meta Llama 2 and 3
- ✅ Mistral AI models
- ✅ Cohere Command models

Check the AWS documentation for the latest compatibility list.
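
You can also check programmatically which foundation models your account can use via the control-plane client; a minimal sketch using list_foundation_models (the output-modality filter is optional):

import boto3

bedrock_control = boto3.client('bedrock', region_name='us-east-1')

# List text-generating foundation models available in this region
models = bedrock_control.list_foundation_models(byOutputModality='TEXT')

for summary in models['modelSummaries']:
    print(f"{summary['modelId']}  ({summary['providerName']})")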

Guardrails: Content Filtering and Safety

AWS Bedrock Guardrails help you implement safeguards for your generative AI applications by filtering harmful content, blocking sensitive information, and enforcing responsible AI practices.

What Are Guardrails?

Guardrails are policies that you define and apply to your Bedrock model interactions to filter harmful content, block or anonymize sensitive information, deny restricted topics, and enforce responsible AI practices.

Key benefits:

- Centralized policy management
- Consistent enforcement across all models
- Real-time content filtering
- Detailed intervention logging
- Compliance with regulations

Types of Guardrails

1. Content Filters

Filter harmful content across multiple categories:

content_filters = [
    {
        "type": "HATE",           # Hate speech, discrimination
        "inputStrength": "HIGH",   # HIGH, MEDIUM, LOW, NONE
        "outputStrength": "HIGH"
    },
    {
        "type": "INSULTS",        # Insults, bullying
        "inputStrength": "MEDIUM",
        "outputStrength": "HIGH"
    },
    {
        "type": "SEXUAL",         # Sexual content
        "inputStrength": "HIGH",
        "outputStrength": "HIGH"
    },
    {
        "type": "VIOLENCE",       # Violence, gore
        "inputStrength": "HIGH",
        "outputStrength": "HIGH"
    },
    {
        "type": "MISCONDUCT",     # Criminal activity, illegal content
        "inputStrength": "HIGH",
        "outputStrength": "HIGH"
    },
    {
        "type": "PROMPT_ATTACK",  # Jailbreak attempts, prompt injection
        "inputStrength": "HIGH",
        "outputStrength": "NONE"
    }
]

Strength levels:

- HIGH: Strictest filtering, blocks most content in category
- MEDIUM: Balanced filtering, blocks obvious violations
- LOW: Minimal filtering, only extreme cases
- NONE: No filtering for this category

2. Sensitive Information Filters (PII)

Protect personally identifiable information:

pii_filters = [
    {
        "type": "EMAIL",
        "action": "BLOCK"  # or "ANONYMIZE"
    },
    {
        "type": "PHONE",
        "action": "ANONYMIZE"
    },
    {
        "type": "NAME",
        "action": "ANONYMIZE"
    },
    {
        "type": "ADDRESS",
        "action": "BLOCK"
    },
    {
        "type": "SSN",  # Social Security Number
        "action": "BLOCK"
    },
    {
        "type": "CREDIT_DEBIT_CARD_NUMBER",
        "action": "BLOCK"
    },
    {
        "type": "IP_ADDRESS",
        "action": "ANONYMIZE"
    },
    {
        "type": "DRIVER_ID",
        "action": "BLOCK"
    },
    {
        "type": "PASSPORT_NUMBER",
        "action": "BLOCK"
    },
    {
        "type": "USERNAME",
        "action": "ANONYMIZE"
    },
    {
        "type": "PASSWORD",
        "action": "BLOCK"
    }
]

Actions:

- BLOCK: Reject the request/response entirely
- ANONYMIZE: Replace with placeholder (e.g., [EMAIL], [PHONE])

3. Denied Topics

Restrict conversations to approved topics:

denied_topics = [
    {
        "name": "Financial Advice",
        "definition": "Providing specific investment recommendations, stock tips, or personalized financial planning advice",
        "examples": [
            "Should I invest in Bitcoin?",
            "What stocks should I buy?",
            "How should I allocate my 401k?"
        ],
        "type": "DENY"
    },
    {
        "name": "Medical Diagnosis",
        "definition": "Diagnosing medical conditions or prescribing treatments",
        "examples": [
            "Do I have cancer?",
            "What medication should I take for my headache?",
            "Is this rash serious?"
        ],
        "type": "DENY"
    },
    {
        "name": "Legal Advice",
        "definition": "Providing specific legal counsel or representation",
        "examples": [
            "Should I sue my employer?",
            "How do I file for bankruptcy?",
            "What should I say in court?"
        ],
        "type": "DENY"
    }
]

4. Word Filters (Profanity/Custom)

Block specific words or phrases:

word_filters = [
    {
        "text": "badword1"
    },
    {
        "text": "inappropriate phrase"
    },
    {
        "text": "competitor-name"
    }
]

Creating Guardrails

Guardrails are created using the Bedrock control plane API:

import boto3
import json

# Create Bedrock client for control plane
bedrock_client = boto3.client('bedrock', region_name='us-east-1')

# Define guardrail configuration
guardrail_config = {
    "name": "my-app-guardrail",
    "description": "Guardrail for customer-facing chatbot",
    "topicPolicyConfig": {
        "topicsConfig": [
            {
                "name": "Medical Advice",
                "definition": "Providing medical diagnoses or treatment recommendations",
                "examples": [
                    "Do I have diabetes?",
                    "What medicine should I take?"
                ],
                "type": "DENY"
            },
            {
                "name": "Financial Advice",
                "definition": "Providing specific investment or financial planning advice",
                "examples": [
                    "Should I buy this stock?",
                    "How should I invest my money?"
                ],
                "type": "DENY"
            }
        ]
    },
    "contentPolicyConfig": {
        "filtersConfig": [
            {
                "type": "HATE",
                "inputStrength": "HIGH",
                "outputStrength": "HIGH"
            },
            {
                "type": "INSULTS",
                "inputStrength": "MEDIUM",
                "outputStrength": "HIGH"
            },
            {
                "type": "SEXUAL",
                "inputStrength": "HIGH",
                "outputStrength": "HIGH"
            },
            {
                "type": "VIOLENCE",
                "inputStrength": "HIGH",
                "outputStrength": "HIGH"
            },
            {
                "type": "MISCONDUCT",
                "inputStrength": "HIGH",
                "outputStrength": "HIGH"
            },
            {
                "type": "PROMPT_ATTACK",
                "inputStrength": "HIGH",
                "outputStrength": "NONE"
            }
        ]
    },
    "sensitiveInformationPolicyConfig": {
        "piiEntitiesConfig": [
            {
                "type": "EMAIL",
                "action": "ANONYMIZE"
            },
            {
                "type": "PHONE",
                "action": "ANONYMIZE"
            },
            {
                "type": "NAME",
                "action": "ANONYMIZE"
            },
            {
                "type": "SSN",
                "action": "BLOCK"
            },
            {
                "type": "CREDIT_DEBIT_CARD_NUMBER",
                "action": "BLOCK"
            }
        ]
    },
    "wordPolicyConfig": {
        "wordsConfig": [
            {"text": "badword1"},
            {"text": "badword2"}
        ],
        "managedWordListsConfig": [
            {"type": "PROFANITY"}  # Use AWS managed profanity list
        ]
    },
    "blockedInputMessaging": "I cannot process requests containing inappropriate content. Please rephrase your message.",
    "blockedOutputsMessaging": "I cannot provide a response to this request as it violates our content policy."
}

# Create the guardrail
response = bedrock_client.create_guardrail(**guardrail_config)

guardrail_id = response['guardrailId']
guardrail_version = response['version']

print(f"Guardrail created: {guardrail_id}")
print(f"Version: {guardrail_version}")

Applying Guardrails to API Calls

Once created, apply guardrails to your model invocations:

With InvokeModel

import boto3
import json

bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-east-1')

request_body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [
        {
            "role": "user",
            "content": "Your prompt here"
        }
    ]
}

response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps(request_body),
    guardrailIdentifier="your-guardrail-id",  # Add guardrail
    guardrailVersion="1"  # or "DRAFT"
)

response_body = json.loads(response['body'].read())
print(response_body['content'][0]['text'])

With Converse API

response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[
        {
            "role": "user",
            "content": [{"text": "Your prompt here"}]
        }
    ],
    inferenceConfig={
        "maxTokens": 1024,
        "temperature": 0.7
    },
    guardrailConfig={
        "guardrailIdentifier": "your-guardrail-id",
        "guardrailVersion": "1",
        "trace": "enabled"  # Enable detailed trace information
    }
)

With Streaming

response = bedrock_runtime.converse_stream(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[
        {
            "role": "user",
            "content": [{"text": "Your prompt here"}]
        }
    ],
    guardrailConfig={
        "guardrailIdentifier": "your-guardrail-id",
        "guardrailVersion": "1",
        "trace": "enabled"
    }
)

for event in response['stream']:
    if 'contentBlockDelta' in event:
        print(event['contentBlockDelta']['delta']['text'], end='', flush=True)
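
Guardrails can also be evaluated against text directly, without invoking a model, through the standalone ApplyGuardrail operation. A hedged sketch using the bedrock-runtime client (field names follow the apply_guardrail API):

result = bedrock_runtime.apply_guardrail(
    guardrailIdentifier="your-guardrail-id",
    guardrailVersion="1",
    source="INPUT",  # use "OUTPUT" to check model responses instead
    content=[
        {"text": {"text": "My SSN is 123-45-6789, can you store it?"}}
    ]
)

if result['action'] == 'GUARDRAIL_INTERVENED':
    # 'outputs' carries the blocked/anonymized messaging configured on the guardrail
    print(result['outputs'][0]['text'])
else:
    print("Content passed the guardrail checks")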

Guardrail Configuration Examples

Example 1: Customer Service Bot

customer_service_guardrail = {
    "name": "customer-service-guardrail",
    "description": "Guardrail for customer service chatbot",
    "contentPolicyConfig": {
        "filtersConfig": [
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "INSULTS", "inputStrength": "MEDIUM", "outputStrength": "HIGH"},
            {"type": "SEXUAL", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH"}
        ]
    },
    "sensitiveInformationPolicyConfig": {
        "piiEntitiesConfig": [
            {"type": "EMAIL", "action": "ANONYMIZE"},
            {"type": "PHONE", "action": "ANONYMIZE"},
            {"type": "CREDIT_DEBIT_CARD_NUMBER", "action": "BLOCK"},
            {"type": "SSN", "action": "BLOCK"}
        ]
    },
    "topicPolicyConfig": {
        "topicsConfig": [
            {
                "name": "Competitor Discussion",
                "definition": "Discussing or comparing with competitor products",
                "examples": ["How do you compare to CompetitorX?"],
                "type": "DENY"
            }
        ]
    },
    "blockedInputMessaging": "I'm here to help with our products and services. Please keep the conversation respectful.",
    "blockedOutputsMessaging": "I apologize, but I cannot provide that information. How else can I assist you?"
}

Example 2: Educational Platform

education_guardrail = {
    "name": "education-platform-guardrail",
    "description": "Guardrail for K-12 educational platform",
    "contentPolicyConfig": {
        "filtersConfig": [
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "INSULTS", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "SEXUAL", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "MISCONDUCT", "inputStrength": "HIGH", "outputStrength": "HIGH"}
        ]
    },
    "wordPolicyConfig": {
        "managedWordListsConfig": [
            {"type": "PROFANITY"}
        ]
    },
    "topicPolicyConfig": {
        "topicsConfig": [
            {
                "name": "Inappropriate Content",
                "definition": "Content not suitable for K-12 students",
                "examples": [
                    "How to cheat on tests",
                    "Inappropriate jokes"
                ],
                "type": "DENY"
            }
        ]
    },
    "blockedInputMessaging": "Let's keep our conversation educational and appropriate. How can I help you learn today?",
    "blockedOutputsMessaging": "I can't help with that, but I'd be happy to help you with your studies!"
}

Example 3: Healthcare Information Bot

healthcare_guardrail = {
    "name": "healthcare-info-guardrail",
    "description": "Guardrail for healthcare information (non-diagnostic)",
    "contentPolicyConfig": {
        "filtersConfig": [
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH"}
        ]
    },
    "sensitiveInformationPolicyConfig": {
        "piiEntitiesConfig": [
            {"type": "NAME", "action": "ANONYMIZE"},
            {"type": "SSN", "action": "BLOCK"},
            {"type": "PHONE", "action": "ANONYMIZE"},
            {"type": "EMAIL", "action": "ANONYMIZE"},
            {"type": "ADDRESS", "action": "ANONYMIZE"}
        ]
    },
    "topicPolicyConfig": {
        "topicsConfig": [
            {
                "name": "Medical Diagnosis",
                "definition": "Attempting to diagnose medical conditions",
                "examples": [
                    "Do I have cancer?",
                    "What disease do I have?",
                    "Am I sick?"
                ],
                "type": "DENY"
            },
            {
                "name": "Prescription Advice",
                "definition": "Recommending specific medications or treatments",
                "examples": [
                    "What medication should I take?",
                    "Should I stop taking my medicine?",
                    "What's the right dosage?"
                ],
                "type": "DENY"
            }
        ]
    },
    "blockedInputMessaging": "I can provide general health information, but I cannot diagnose conditions or prescribe treatments. Please consult a healthcare professional.",
    "blockedOutputsMessaging": "For medical advice specific to your situation, please consult with a qualified healthcare provider."
}

Handling Guardrail Interventions

When a guardrail blocks content, you receive specific information about the intervention:

def invoke_with_guardrail_handling(bedrock_client, model_id, messages, guardrail_id):
    """
    Invoke model with comprehensive guardrail error handling
    """
    try:
        response = bedrock_client.converse(
            modelId=model_id,
            messages=messages,
            guardrailConfig={
                "guardrailIdentifier": guardrail_id,
                "guardrailVersion": "1",
                "trace": "enabled"
            }
        )

        # Check if guardrail intervened
        if 'trace' in response:
            trace = response['trace']
            if 'guardrail' in trace:
                guardrail_trace = trace['guardrail']

                # Input was blocked
                if guardrail_trace.get('inputAssessment'):
                    input_assessment = guardrail_trace['inputAssessment']

                    # Content policy violations
                    if 'contentPolicy' in input_assessment:
                        filters = input_assessment['contentPolicy']['filters']
                        for filter_item in filters:
                            if filter_item['action'] == 'BLOCKED':
                                print(f"Input blocked - {filter_item['type']}: {filter_item['confidence']}")

                    # Topic policy violations
                    if 'topicPolicy' in input_assessment:
                        topics = input_assessment['topicPolicy']['topics']
                        for topic in topics:
                            if topic['action'] == 'BLOCKED':
                                print(f"Input blocked - Topic: {topic['name']}")

                    # PII detected
                    if 'sensitiveInformationPolicy' in input_assessment:
                        pii_entities = input_assessment['sensitiveInformationPolicy']['piiEntities']
                        for entity in pii_entities:
                            print(f"PII detected: {entity['type']} - Action: {entity['action']}")

                # Output was blocked
                if guardrail_trace.get('outputAssessment'):
                    output_assessment = guardrail_trace['outputAssessment']
                    print("Output was blocked by guardrail")

        return response

    except bedrock_client.exceptions.ValidationException as e:
        print(f"Validation error: {e}")
        return None
    except Exception as e:
        print(f"Error: {e}")
        return None

# Usage
response = invoke_with_guardrail_handling(
    bedrock_runtime,
    "anthropic.claude-3-sonnet-20240229-v1:0",
    [{"role": "user", "content": [{"text": "Your message"}]}],
    "your-guardrail-id"
)

Guardrails Best Practices

1. Start with Moderate Settings

# Don't start with all HIGH settings
# ❌ Too restrictive
{"type": "INSULTS", "inputStrength": "HIGH", "outputStrength": "HIGH"}

# ✅ Start balanced, adjust based on testing
{"type": "INSULTS", "inputStrength": "MEDIUM", "outputStrength": "HIGH"}

2. Test Thoroughly

test_cases = [
    # Legitimate use cases
    "How do I reset my password?",
    "What are your business hours?",

    # Edge cases
    "My email is john@example.com, can you help?",
    "I'm frustrated with this service",

    # Should be blocked
    "You're terrible at your job",
    "Tell me how to hack a system"
]

for test in test_cases:
    print(f"\nTesting: {test}")
    response = invoke_with_guardrail(test)
    print(f"Result: {response}")

3. Use Appropriate PII Actions

# For customer service - anonymize to maintain context
{"type": "EMAIL", "action": "ANONYMIZE"}  # Becomes [EMAIL]
{"type": "PHONE", "action": "ANONYMIZE"}  # Becomes [PHONE]

# For sensitive data - block completely
{"type": "SSN", "action": "BLOCK"}
{"type": "CREDIT_DEBIT_CARD_NUMBER", "action": "BLOCK"}
{"type": "PASSWORD", "action": "BLOCK"}

4. Provide Clear Blocked Messages

# ❌ Vague
"blockedInputMessaging": "Request blocked."

# ✅ Helpful and clear
"blockedInputMessaging": "I cannot process requests with inappropriate content. Please rephrase your message respectfully, and I'll be happy to help."

# ✅ Specific to use case
"blockedInputMessaging": "For your privacy and security, I cannot process messages containing sensitive personal information like credit card numbers or social security numbers."

5. Version Your Guardrails

# Create new version for changes
response = bedrock_client.create_guardrail_version(
    guardrailIdentifier="your-guardrail-id",
    description="Added financial advice topic restriction"
)

# Test new version before promoting
guardrail_config = {
    "guardrailIdentifier": "your-guardrail-id",
    "guardrailVersion": "DRAFT",  # Test with DRAFT first
    "trace": "enabled"
}

# After testing, use specific version in production
guardrail_config = {
    "guardrailIdentifier": "your-guardrail-id",
    "guardrailVersion": "2",  # Stable version
    "trace": "enabled"
}

6. Monitor and Iterate

from datetime import datetime

class GuardrailMonitor:
    """Monitor guardrail interventions and adjust policies"""

    def __init__(self):
        self.interventions = []

    def log_intervention(self, intervention_type, details):
        """Log when guardrail blocks content"""
        self.interventions.append({
            "timestamp": datetime.now(),
            "type": intervention_type,
            "details": details
        })

    def get_statistics(self):
        """Analyze intervention patterns"""
        stats = {}
        for intervention in self.interventions:
            type_key = intervention['type']
            stats[type_key] = stats.get(type_key, 0) + 1
        return stats

    def identify_false_positives(self):
        """Flag potential false positives for review"""
        # Implement logic to identify patterns
        # that might indicate overly strict filtering
        pass

# Usage
monitor = GuardrailMonitor()

# In your application
response = invoke_with_guardrail(message)
if response.get('blocked'):
    monitor.log_intervention(
        response['block_reason'],
        {"message": message, "assessment": response['assessment']}
    )

# Periodically review
print(monitor.get_statistics())

Monitoring and Logging Guardrails

Enable CloudWatch logging for guardrail activity:

import boto3

logs_client = boto3.client('logs', region_name='us-east-1')

# Create log group for guardrail monitoring
log_group_name = '/aws/bedrock/guardrails'

try:
    logs_client.create_log_group(logGroupName=log_group_name)
    print(f"Log group created: {log_group_name}")
except logs_client.exceptions.ResourceAlreadyExistsException:
    print(f"Log group already exists: {log_group_name}")

# Query guardrail logs
def query_guardrail_logs(start_time, end_time):
    """Query CloudWatch logs for guardrail interventions"""

    query = """
    fields @timestamp, guardrailId, action, policyType, @message
    | filter action = "BLOCKED"
    | sort @timestamp desc
    | limit 100
    """

    response = logs_client.start_query(
        logGroupName=log_group_name,
        startTime=int(start_time.timestamp()),
        endTime=int(end_time.timestamp()),
        queryString=query
    )

    query_id = response['queryId']

    # Wait for query to complete
    import time
    while True:
        result = logs_client.get_query_results(queryId=query_id)
        if result['status'] in ('Complete', 'Failed', 'Cancelled'):
            return result.get('results', [])
        time.sleep(1)

# Usage
from datetime import datetime, timedelta

end_time = datetime.now()
start_time = end_time - timedelta(hours=24)

blocked_requests = query_guardrail_logs(start_time, end_time)
print(f"Blocked requests in last 24 hours: {len(blocked_requests)}")

Key metrics to monitor:

  • Total interventions by type (content, topic, PII, word)
  • False positive rate
  • User experience impact
  • Most common blocked topics
  • PII detection frequency
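
To track these over time, one option is to publish your own counts as CloudWatch custom metrics from application code. A minimal sketch (the namespace and dimension names below are assumptions, not Bedrock defaults):

import boto3

cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')

def publish_guardrail_metric(intervention_type, count=1):
    """Publish a guardrail intervention count as a custom CloudWatch metric."""
    cloudwatch.put_metric_data(
        Namespace='Custom/BedrockGuardrails',  # assumed namespace
        MetricData=[{
            'MetricName': 'GuardrailInterventions',
            'Dimensions': [{'Name': 'InterventionType', 'Value': intervention_type}],
            'Value': count,
            'Unit': 'Count'
        }]
    )

# Example: record a blocked-topic intervention
publish_guardrail_metric('DENIED_TOPIC')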

Guardrails provide essential safety and compliance features for production AI applications. Start with moderate settings, test thoroughly, and iterate based on real-world usage patterns.

Inference Parameters by Model

Anthropic Claude Models

request_body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 2048,           # Maximum tokens to generate (required)
    "temperature": 0.7,            # Randomness (0.0-1.0)
    "top_p": 0.9,                  # Nucleus sampling (0.0-1.0)
    "top_k": 250,                  # Top-k sampling
    "stop_sequences": ["\n\n"],   # Stop generation at these sequences
    "messages": [
        {
            "role": "user",
            "content": "Your prompt here"
        }
    ],
    "system": "You are a helpful assistant"  # Optional system prompt
}

Parameter Details:

  • max_tokens (required): Maximum number of tokens to generate (1-4096 depending on model)
  • temperature: Controls randomness. Lower = more focused, higher = more creative
  • top_p: Cumulative probability for nucleus sampling
  • top_k: Limits vocabulary to top K tokens
  • stop_sequences: Array of strings that stop generation when encountered

Amazon Titan Text Models

request_body = {
    "inputText": "Your prompt here",
    "textGenerationConfig": {
        "maxTokenCount": 512,      # Max tokens (0-8192)
        "temperature": 0.7,        # Randomness (0.0-1.0)
        "topP": 0.9,              # Nucleus sampling (0.0-1.0)
        "stopSequences": []        # Stop sequences
    }
}

Cohere Command Models

request_body = {
    "prompt": "Your prompt here",
    "max_tokens": 512,             # Max tokens to generate
    "temperature": 0.7,            # Randomness (0.0-5.0)
    "p": 0.9,                      # Nucleus sampling (0.0-1.0)
    "k": 0,                        # Top-k sampling (0-500)
    "stop_sequences": [],          # Stop sequences
    "return_likelihoods": "NONE"   # NONE, GENERATION, ALL
}

AI21 Jurassic Models

request_body = {
    "prompt": "Your prompt here",
    "maxTokens": 512,              # Max tokens (1-8191)
    "temperature": 0.7,            # Randomness (0.0-1.0)
    "topP": 0.9,                   # Nucleus sampling
    "stopSequences": [],           # Stop sequences
    "countPenalty": {
        "scale": 0
    },
    "presencePenalty": {
        "scale": 0
    },
    "frequencyPenalty": {
        "scale": 0
    }
}

Meta Llama Models

request_body = {
    "prompt": "Your prompt here",
    "max_gen_len": 512,            # Max tokens to generate
    "temperature": 0.7,            # Randomness (0.0-1.0)
    "top_p": 0.9                   # Nucleus sampling (0.0-1.0)
}

Understanding Tokens

Before diving into inference parameters, it's essential to understand what tokens are, as they're fundamental to how language models work and how you're billed.

What Are Tokens?

Tokens are the basic units that language models read and generate. Think of them as pieces of words:

Examples:

"Hello, world!" = 4 tokens ["Hello", ",", " world", "!"]
"ChatGPT is amazing" = 5 tokens ["Chat", "G", "PT", " is", " amazing"]
"I'm learning AI" = 5 tokens ["I", "'m", " learning", " AI"]

Why Tokens Matter

  1. Cost: You're charged per token (input + output)
  2. Context Limits: Models have maximum token limits (e.g., 200K tokens for Claude 3)
  3. Performance: More tokens = longer processing time
  4. Quality: Token limits affect how much context you can provide

Token Estimation

As a rough guide:

  • 1 token ≈ 4 characters in English
  • 1 token ≈ ¾ of a word
  • 100 tokens ≈ 75 words
  • 1,000 tokens ≈ 750 words

Practical Example:

# A typical conversation:
prompt = "Explain quantum computing"  # ~4 tokens
response = "Quantum computing uses quantum mechanics..."  # ~500 tokens
total_tokens = 504  # This is what you're billed for
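
If you want a rough estimate in code before sending a request, a simple character-based heuristic works (this is an approximation only; actual tokenization is model-specific):

def estimate_tokens(text: str) -> int:
    """Very rough estimate: ~4 characters per token for English text."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("Explain quantum computing"))  # rough estimate; actual count differs by tokenizer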

Managing Token Usage

# Always set max_tokens to control costs
request_body = {
    "max_tokens": 500,  # Limit response length
    "messages": [{"role": "user", "content": prompt}]
}

# Monitor token usage in responses
response_body = json.loads(response['body'].read())
usage = response_body.get('usage', {})
print(f"Input tokens: {usage.get('input_tokens')}")
print(f"Output tokens: {usage.get('output_tokens')}")
print(f"Total cost: ${(usage.get('input_tokens') * 0.003 + usage.get('output_tokens') * 0.015) / 1000}")

Common Inference Parameters Explained

Inference parameters control how the model generates text. Understanding these is key to getting the outputs you want.

Temperature: Controlling Randomness

What it does: Temperature controls the randomness of the model's predictions.

How it works: When a model generates the next token, it calculates probabilities for all possible tokens. Temperature adjusts these probabilities:

Visual Example:

Next token probabilities at different temperatures:

Temperature 0.1 (Focused):
"the" → 85% ████████████████████
"a"   → 10% ███
"an"  →  5% ██

Temperature 1.0 (Creative):
"the" → 40% ████████
"a"   → 30% ██████
"an"  → 20% ████
"my"  → 10% ██

When to use:

# Factual tasks: Use LOW temperature (0.0-0.3)
# - Answering questions
# - Summarization
# - Translation
# - Code generation
request_body = {
    "temperature": 0.1,
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
}
# Output: "The capital of France is Paris." (consistent every time)

# Creative tasks: Use HIGH temperature (0.7-1.0)
# - Story writing
# - Brainstorming
# - Poetry
# - Marketing copy
request_body = {
    "temperature": 0.9,
    "messages": [{"role": "user", "content": "Write a creative tagline for a coffee shop"}]
}
# Output varies: "Where dreams brew daily" / "Sip, savor, smile" / "Your daily dose of magic"

# Balanced tasks: Use MEDIUM temperature (0.5-0.7)
# - Conversational AI
# - General assistance
# - Explanations
request_body = {
    "temperature": 0.6,
    "messages": [{"role": "user", "content": "Explain how photosynthesis works"}]
}

Pro tip: Start with 0.7 and adjust based on results. If outputs are too random, decrease. If too repetitive, increase.
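
One way to see the effect for yourself is to run the same prompt at a low and a high temperature and compare the outputs. A quick sketch using the invoke_model pattern shown earlier (the high-temperature outputs will vary between runs):

import json

def sample_at_temperature(bedrock, temperature, prompt="Write a tagline for a coffee shop"):
    """Invoke Claude once at the given temperature and return the generated text."""
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 100,
        "temperature": temperature,
        "messages": [{"role": "user", "content": prompt}]
    }
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        body=json.dumps(body)
    )
    return json.loads(response['body'].read())['content'][0]['text']

for temp in (0.1, 0.9):
    print(f"temperature={temp}: {sample_at_temperature(bedrock, temp)}")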

Top P (Nucleus Sampling): Controlling Diversity

What it does: Top P limits the model to consider only the most probable tokens whose cumulative probability adds up to P.

How it works: Instead of considering all possible tokens, the model:

  1. Sorts tokens by probability (highest to lowest)
  2. Adds probabilities until reaching the P threshold
  3. Only samples from this subset

Visual Example:

All tokens sorted by probability:
"the"    → 40%  ████████
"a"      → 25%  █████
"an"     → 15%  ███
"this"   → 10%  ██
"my"     →  5%  █
"your"   →  3%  
"our"    →  2%  
... (hundreds more)

With top_p = 0.8:
✓ "the"  (40%) - included (cumulative: 40%)
✓ "a"    (25%) - included (cumulative: 65%)
✓ "an"   (15%) - included (cumulative: 80%)
✗ "this" (10%) - excluded (would exceed 80%)
✗ All others excluded

Model only chooses from: ["the", "a", "an"]

When to use:

# Focused, consistent output: Use LOW top_p (0.1-0.5)
request_body = {
    "top_p": 0.3,
    "temperature": 0.7,
    "messages": [{"role": "user", "content": "List the steps to bake bread"}]
}
# Sticks to most likely, conventional responses

# Balanced output: Use MEDIUM top_p (0.7-0.9)
request_body = {
    "top_p": 0.85,
    "temperature": 0.7,
    "messages": [{"role": "user", "content": "Describe a sunset"}]
}
# Good mix of common and interesting word choices

# Creative, diverse output: Use HIGH top_p (0.95-1.0)
request_body = {
    "top_p": 0.98,
    "temperature": 0.8,
    "messages": [{"role": "user", "content": "Write a surreal poem"}]
}
# Considers wider vocabulary, more unexpected choices

Relationship with Temperature:

  • Temperature adjusts the probability distribution
  • Top P then selects which tokens to consider
  • Use both together for fine control
  • Common combination: temperature=0.7, top_p=0.9

Top K: Limiting Vocabulary

What it does: Top K limits the model to only consider the K most likely tokens at each step.

How it works: Top K is simpler than Top P:

  1. Sort all tokens by probability
  2. Keep only the top K tokens
  3. Sample from these K tokens

Visual Example:

With top_k = 3:

All tokens:                    Top 3 only:
"the"    → 40%                "the"  → 40%
"a"      → 25%                "a"    → 25%
"an"     → 15%                "an"   → 15%
"this"   → 10%  ← cut off
"my"     →  5%  ← cut off
... (all others ignored)

When to use:

# Very focused: top_k = 1-10
request_body = {
    "top_k": 5,
    "messages": [{"role": "user", "content": "What is 2+2?"}]
}
# Extremely deterministic, only most likely words

# Balanced: top_k = 40-100
request_body = {
    "top_k": 50,
    "messages": [{"role": "user", "content": "Describe a forest"}]
}
# Good variety while avoiding very unlikely words

# Creative: top_k = 200-500
request_body = {
    "top_k": 250,
    "messages": [{"role": "user", "content": "Invent a new creature"}]
}
# Wider vocabulary, more creative freedom

Top K vs Top P:

  • Top K: Fixed number of tokens (e.g., always 50 tokens)
  • Top P: Dynamic number based on probability (could be 3 tokens or 100 tokens)
  • Top P is generally preferred because it adapts to the situation
  • Some models use both together

Max Tokens: Controlling Response Length

What it does: Sets the maximum number of tokens the model can generate in its response.

How it works:

  • The model stops generating when it reaches max_tokens
  • OR when it naturally completes (hits a stop sequence)
  • Whichever comes first

Important considerations:

# Too low: Response gets cut off mid-sentence
request_body = {
    "max_tokens": 10,
    "messages": [{"role": "user", "content": "Explain machine learning"}]
}
# Output: "Machine learning is a subset of artificial..." [TRUNCATED]

# Too high: Unnecessary cost and latency
request_body = {
    "max_tokens": 4000,
    "messages": [{"role": "user", "content": "What is 2+2?"}]
}
# Output: "4" (only uses ~1 token, but you reserved 4000)

# Just right: Based on expected response
request_body = {
    "max_tokens": 500,  # ~375 words
    "messages": [{"role": "user", "content": "Summarize this article"}]
}

Setting max_tokens by use case:

# Short answers (50-100 tokens)
"What is the capital of France?"
max_tokens = 50

# Paragraphs (200-500 tokens)
"Explain how neural networks work"
max_tokens = 400

# Essays/Articles (1000-2000 tokens)
"Write a blog post about climate change"
max_tokens = 1500

# Long-form content (2000-4000 tokens)
"Write a detailed tutorial on Python decorators"
max_tokens = 3000

Cost implications:

# Example pricing (Claude 3 Sonnet):
# Input: $0.003 per 1K tokens
# Output: $0.015 per 1K tokens

# Short response (100 tokens)
cost = (100 * 0.015) / 1000   # $0.0015

# Long response (2000 tokens)
cost = (2000 * 0.015) / 1000  # $0.03

# Over a million requests:
# Short: $1,500
# Long: $30,000
# Setting appropriate max_tokens saves real money!

Stop Sequences: Controlling When to Stop

What it does: Tells the model to stop generating when it encounters specific strings.

How it works:

  • Model generates tokens normally
  • After each token, checks if output ends with a stop sequence
  • If match found, stops immediately (stop sequence not included in output)

Common use cases:

# Stop at paragraph breaks
request_body = {
    "stop_sequences": ["\n\n"],
    "messages": [{"role": "user", "content": "Write one paragraph about dogs"}]
}
# Ensures only one paragraph is returned

# Stop at specific markers
request_body = {
    "stop_sequences": ["END", "---", "###"],
    "messages": [{"role": "user", "content": "Generate a code snippet"}]
}

# Stop at conversation turns
request_body = {
    "stop_sequences": ["Human:", "User:", "\nQ:"],
    "messages": [{"role": "user", "content": "Continue this dialogue"}]
}

# Stop at list completion
request_body = {
    "stop_sequences": ["\n\n", "Conclusion", "In summary"],
    "messages": [{"role": "user", "content": "List 5 benefits of exercise"}]
}

Practical example:

# Without stop sequence:
prompt = "List 3 fruits:"
response = "1. Apple\n2. Banana\n3. Orange\n\nFruits are nutritious and..."
# Keeps going!

# With stop sequence:
request_body = {
    "stop_sequences": ["\n\n"],
    "messages": [{"role": "user", "content": "List 3 fruits:"}]
}
response = "1. Apple\n2. Banana\n3. Orange"
# Stops at double newline

Combining Parameters for Optimal Results

The parameter interaction matrix:

# Factual, deterministic responses
factual_config = {
    "temperature": 0.1,      # Very focused
    "top_p": 0.5,           # Limited vocabulary
    "max_tokens": 300,      # Concise
    "stop_sequences": ["\n\n"]
}

# Creative, diverse responses
creative_config = {
    "temperature": 0.9,      # High randomness
    "top_p": 0.95,          # Wide vocabulary
    "max_tokens": 2000,     # Room for creativity
    "stop_sequences": []     # Let it flow
}

# Balanced, conversational responses
balanced_config = {
    "temperature": 0.7,      # Moderate randomness
    "top_p": 0.9,           # Good variety
    "max_tokens": 800,      # Reasonable length
    "stop_sequences": ["\n\nHuman:", "\n\nUser:"]
}

# Code generation
code_config = {
    "temperature": 0.2,      # Precise
    "top_p": 0.8,           # Focused on common patterns
    "max_tokens": 1500,     # Enough for functions
    "stop_sequences": ["```\n\n", "# End"]
}
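
A small helper can then merge one of these presets into an actual request body. The function below is a hypothetical convenience wrapper around invoke_model, not part of the Bedrock SDK:

import json

PRESETS = {
    "factual": factual_config,
    "creative": creative_config,
    "balanced": balanced_config,
    "code": code_config,
}

def invoke_with_preset(bedrock, prompt, preset="balanced"):
    """Build a Claude request body from a named preset and invoke the model."""
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "messages": [{"role": "user", "content": prompt}],
        **PRESETS[preset],  # temperature, top_p, max_tokens, stop_sequences
    }
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        body=json.dumps(body)
    )
    return json.loads(response['body'].read())['content'][0]['text']

# Example
print(invoke_with_preset(bedrock, "Summarize the water cycle", preset="factual"))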

Error Handling

import botocore

try:
    response = bedrock.invoke_model(
        modelId=model_id,
        body=json.dumps(request_body)
    )
except botocore.exceptions.ClientError as error:
    if error.response['Error']['Code'] == 'ValidationException':
        print("Invalid request parameters")
    elif error.response['Error']['Code'] == 'ResourceNotFoundException':
        print("Model not found or not enabled")
    elif error.response['Error']['Code'] == 'ThrottlingException':
        print("Rate limit exceeded")
    else:
        print(f"Error: {error}")

Best Practices

General Best Practices

  1. Start with default parameters - Use recommended defaults before tuning
  2. Adjust temperature based on use case:
    • Factual tasks: 0.1-0.3
    • Creative writing: 0.7-0.9
    • General purpose: 0.5-0.7
  3. Use stop sequences - Prevent unwanted continuation
  4. Monitor token usage - Control costs by setting appropriate max_tokens
  5. Handle streaming for long responses - Better user experience
  6. Implement retry logic - Handle throttling and transient errors
  7. Cache responses - Reduce API calls for repeated queries (see the caching sketch below)
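
A simple in-memory cache keyed on the request can cut costs for repeated queries. This is a hypothetical helper (the cache dict and function name are illustrative); in production you would more likely use a shared store such as ElastiCache or DynamoDB:

import hashlib
import json

_response_cache = {}

def cached_invoke(bedrock, model_id, request_body):
    """Return a cached response for identical requests, otherwise call Bedrock."""
    cache_key = hashlib.sha256(
        json.dumps({"model": model_id, "body": request_body}, sort_keys=True).encode()
    ).hexdigest()

    if cache_key in _response_cache:
        return _response_cache[cache_key]

    response = bedrock.invoke_model(modelId=model_id, body=json.dumps(request_body))
    text = json.loads(response['body'].read())['content'][0]['text']
    _response_cache[cache_key] = text
    return text

Caching pairs naturally with the deterministic settings described in the next section: if the same prompt can legitimately produce different outputs, serving a cached answer changes behavior.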

Achieving Stable, Consistent, and Repeatable Responses

When you need deterministic outputs (same input → same output every time), such as for customer support, automated reporting, compliance, or testing, follow these parameter settings:

The Stability Formula

For maximum stability and repeatability, minimize all sources of randomness:

# Maximum stability configuration
stable_config = {
    "temperature": 0.0,        # No randomness - always pick most probable token
    "top_p": 0.1,             # Restrict to top 10% probability mass
    "top_k": 1,               # Only consider the single most probable token
    "max_tokens": 1000
}

# Alternative: Slightly flexible but still very stable
balanced_stable_config = {
    "temperature": 0.1,        # Minimal randomness
    "top_p": 0.2,             # Small probability window
    "top_k": 5,               # Top 5 tokens only
    "max_tokens": 1000
}

Parameter-by-Parameter Guide for Stability

1. Temperature: Set to 0.0 - 0.3

Why: Temperature controls randomness. Lower = more deterministic.

# Maximum determinism
{"temperature": 0.0}  # Always picks most probable token (greedy decoding)

# Very stable with tiny variation
{"temperature": 0.1}  # 99% deterministic, allows minimal variation

# Stable but slightly flexible
{"temperature": 0.3}  # Good for factual responses with some natural variation

Effect:

  • 0.0: Identical output every time (100% repeatable)
  • 0.1: Nearly identical with minor word choice variations
  • 0.3: Consistent meaning but may vary phrasing slightly

2. Top_p: Set to 0.1 - 0.3 or Disable

Why: Restricts token selection to high-probability options only.

# Very restrictive (most stable)
{"top_p": 0.1}  # Only top 10% probability mass

# Balanced stability
{"top_p": 0.2}  # Top 20% probability mass

# For maximum stability, combine with low temperature
{"temperature": 0.1, "top_p": 0.1}

Note: Some models may not support disabling top_p entirely. Use the lowest value that works.

3. Top_k: Set to 1 - 10

Why: Limits vocabulary to the most probable tokens.

# Maximum determinism (greedy decoding)
{"top_k": 1}  # Always pick the single most probable token

# Very stable with minimal variation
{"top_k": 3}  # Choose from top 3 tokens only

# Stable but allows some natural variation
{"top_k": 10}  # Top 10 tokens - still quite deterministic

Effect:

  • top_k = 1: Completely deterministic (same as temperature = 0)
  • top_k = 3-5: Highly consistent with minor variations
  • top_k = 10: Stable but more natural-sounding

4. Stop Sequences: Use Consistently

Why: Ensures output ends at the same point every time.

# Define clear stop points
{
    "stop_sequences": ["\n\n", "END", "---"],
    "temperature": 0.1
}

5. Avoid Penalties (If Available)

Why: Penalties introduce variability to avoid repetition.

# For stability, disable penalties
{
    "frequency_penalty": 0,   # Don't penalize repeated tokens
    "presence_penalty": 0     # Don't encourage new topics
}

Note: Bedrock models may not expose these parameters directly, but be aware if using other platforms.

Complete Stability Configurations by Use Case

Use Case 1: Customer Support Bot (Maximum Consistency)

# Note: the Converse API's inferenceConfig accepts maxTokens, temperature,
# topP, and stopSequences; model-specific options like top_k go in
# additionalModelRequestFields instead.
customer_support_config = {
    "temperature": 0.1,
    "topP": 0.2,
    "maxTokens": 500,
    "stopSequences": ["\n\nCustomer:", "\n\nAgent:"]
}

# Example usage
response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[
        {
            "role": "user",
            "content": [{"text": "How do I reset my password?"}]
        }
    ],
    system=[
        {"text": "You are a customer support agent. Provide clear, step-by-step instructions."}
    ],
    inferenceConfig=customer_support_config
)

# Result: Same question will always get the same answer

Use Case 2: Automated Reporting (100% Repeatability)

reporting_config = {
    "temperature": 0.0,  # Zero randomness
    "topP": 0.1,
    "maxTokens": 2000
    # For greedy decoding, pass {"top_k": 1} via additionalModelRequestFields
}

# Example: Generate monthly report
response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[
        {
            "role": "user",
            "content": [{"text": f"Generate monthly sales report for: {sales_data}"}]
        }
    ],
    system=[
        {"text": """Generate a sales report with this exact structure:
1. Executive Summary
2. Key Metrics (bullet points)
3. Top Performers
4. Areas for Improvement
5. Recommendations

Use professional, formal language."""}
    ],
    inferenceConfig=reporting_config
)

# Result: Identical data will produce identical reports

Use Case 3: Compliance/Auditing (Reproducible Outputs)

compliance_config = {
    "temperature": 0.0,
    "topP": 0.1,
    "maxTokens": 1500,
    "stopSequences": ["END OF ANALYSIS"]
}

# Example: Compliance check
response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[
        {
            "role": "user",
            "content": [{"text": f"Analyze this transaction for compliance: {transaction}"}]
        }
    ],
    system=[
        {"text": """You are a compliance analyzer. For each transaction, provide:
1. Compliance Status: PASS/FAIL/REVIEW
2. Regulations Checked: [list]
3. Findings: [detailed list]
4. Risk Level: LOW/MEDIUM/HIGH
5. Recommended Action: [specific action]

Be consistent and deterministic in your analysis."""}
    ],
    inferenceConfig=compliance_config
)

# Result: Same transaction always gets same analysis

Use Case 4: API Response Generation (Consistent JSON)

api_response_config = {
    "temperature": 0.0,
    "topP": 0.1,
    "maxTokens": 1000,
    "stopSequences": ["\n```"]
}

# Example: Generate API response
response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[
        {
            "role": "user",
            "content": [{"text": f"Convert to JSON: {user_data}"}]
        }
    ],
    system=[
        {"text": """Convert input to valid JSON with this exact structure:
{
  "status": "success",
  "data": {...},
  "timestamp": "ISO-8601"
}

Output only valid JSON, no explanations."""}
    ],
    inferenceConfig=api_response_config
)

# Result: Same input always produces same JSON structure

Use Case 5: Testing and QA (Reproducible Test Cases)

testing_config = {
    "temperature": 0.0,
    "topP": 0.1,
    "maxTokens": 800
}

# Example: Generate test cases
def generate_test_case(feature_description):
    """Generate consistent test cases for QA"""
    response = bedrock.converse(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        messages=[
            {
                "role": "user",
                "content": [{"text": f"Generate test cases for: {feature_description}"}]
            }
        ],
        system=[
            {"text": """Generate test cases in this format:
Test Case ID: TC-XXX
Description: [clear description]
Preconditions: [list]
Steps: [numbered steps]
Expected Result: [specific outcome]
Priority: HIGH/MEDIUM/LOW

Be consistent and thorough."""}
        ],
        inferenceConfig=testing_config
    )
    return response

# Result: Same feature description always generates same test cases

Practical Implementation: Stability Helper Class

class StableBedrockClient:
    """
    Bedrock client optimized for stable, repeatable responses
    """

    STABILITY_PRESETS = {
        "maximum": {
            "temperature": 0.0,
            "top_p": 0.1,
            "top_k": 1,
            "description": "100% deterministic - identical outputs"
        },
        "high": {
            "temperature": 0.1,
            "top_p": 0.2,
            "top_k": 5,
            "description": "Very stable with minimal variation"
        },
        "moderate": {
            "temperature": 0.3,
            "top_p": 0.3,
            "top_k": 10,
            "description": "Stable but allows natural phrasing"
        }
    }

    def __init__(self, region_name='us-east-1'):
        self.client = boto3.client('bedrock-runtime', region_name=region_name)

    def invoke_stable(
        self,
        prompt: str,
        model_id: str = "anthropic.claude-3-sonnet-20240229-v1:0",
        stability_level: str = "high",
        system_prompt: str = None,
        max_tokens: int = 1000
    ):
        """
        Invoke model with stability-optimized parameters

        Args:
            prompt: User prompt
            model_id: Bedrock model ID
            stability_level: "maximum", "high", or "moderate"
            system_prompt: Optional system prompt
            max_tokens: Maximum tokens to generate

        Returns:
            Stable, repeatable response
        """
        # Get stability preset
        config = self.STABILITY_PRESETS.get(stability_level, self.STABILITY_PRESETS["high"])

        # Build inference config (top_k from the preset is not part of the
        # Converse inferenceConfig; pass it via additionalModelRequestFields if needed)
        inference_config = {
            "temperature": config["temperature"],
            "topP": config["top_p"],
            "maxTokens": max_tokens
        }

        # Build request
        request_params = {
            "modelId": model_id,
            "messages": [
                {
                    "role": "user",
                    "content": [{"text": prompt}]
                }
            ],
            "inferenceConfig": inference_config
        }

        if system_prompt:
            request_params["system"] = [{"text": system_prompt}]

        # Invoke
        response = self.client.converse(**request_params)

        return {
            "text": response['output']['message']['content'][0]['text'],
            "usage": response['usage'],
            "config_used": config
        }

    def test_stability(self, prompt: str, iterations: int = 5):
        """
        Test stability by running same prompt multiple times

        Returns:
            Dictionary with results and consistency analysis
        """
        results = []

        for i in range(iterations):
            response = self.invoke_stable(prompt, stability_level="maximum")
            results.append(response['text'])

        # Check if all responses are identical
        all_identical = all(r == results[0] for r in results)
        unique_responses = len(set(results))

        return {
            "all_identical": all_identical,
            "unique_responses": unique_responses,
            "total_iterations": iterations,
            "consistency_rate": f"{((iterations - unique_responses + 1) / iterations) * 100:.1f}%",
            "responses": results
        }

# Usage Examples
client = StableBedrockClient()

# Example 1: Maximum stability
response = client.invoke_stable(
    prompt="What is the capital of France?",
    stability_level="maximum"
)
print(response['text'])
# Output: "The capital of France is Paris." (always identical)

# Example 2: Test stability
test_results = client.test_stability(
    prompt="Explain what machine learning is in one sentence.",
    iterations=10
)
print(f"Consistency: {test_results['consistency_rate']}")
print(f"Unique responses: {test_results['unique_responses']}/10")

# Example 3: Customer support with high stability
support_response = client.invoke_stable(
    prompt="How do I reset my password?",
    stability_level="high",
    system_prompt="You are a helpful customer support agent. Provide clear, step-by-step instructions."
)
print(support_response['text'])

Trade-offs and Considerations

Pros of Stable Configuration

✅ Consistency: Same input always produces same output
✅ Predictability: Easier to test and validate
✅ Reliability: Users get consistent information
✅ Compliance: Reproducible for auditing
✅ Debugging: Easier to identify issues

Cons of Stable Configuration

❌ Less Natural: Responses may sound robotic or repetitive
❌ Reduced Creativity: Cannot generate diverse alternatives
❌ Ambiguity Issues: May struggle with open-ended questions
❌ Repetitive Phrasing: Same phrases used repeatedly
❌ Less Engaging: Conversations may feel mechanical

When to Use Stable vs. Creative Configurations

# Use STABLE configuration for:
stable_use_cases = [
    "Customer support FAQs",
    "Automated reporting",
    "Compliance analysis",
    "API response generation",
    "Testing and QA",
    "Data extraction",
    "Classification tasks",
    "Fact-based Q&A"
]

# Use CREATIVE configuration for:
creative_use_cases = [
    "Content writing",
    "Brainstorming",
    "Story generation",
    "Marketing copy",
    "Creative problem solving",
    "Conversational chat",
    "Idea generation",
    "Poetry or artistic content"
]

# Use BALANCED configuration for:
balanced_use_cases = [
    "General assistance",
    "Educational tutoring",
    "Code explanation",
    "Technical documentation",
    "Email drafting",
    "Meeting summaries"
]

Verification: Testing Stability

def verify_stability(bedrock_client, prompt, config, iterations=10):
    """
    Verify that a configuration produces stable outputs
    """
    responses = []

    for i in range(iterations):
        response = bedrock_client.converse(
            modelId="anthropic.claude-3-sonnet-20240229-v1:0",
            messages=[{"role": "user", "content": [{"text": prompt}]}],
            inferenceConfig=config
        )
        text = response['output']['message']['content'][0]['text']
        responses.append(text)

    # Calculate metrics
    unique_responses = len(set(responses))
    consistency_rate = ((iterations - unique_responses + 1) / iterations) * 100

    print(f"Stability Test Results:")
    print(f"  Total runs: {iterations}")
    print(f"  Unique responses: {unique_responses}")
    print(f"  Consistency rate: {consistency_rate:.1f}%")
    print(f"  Configuration: {config}")

    if unique_responses == 1:
        print("  ✅ Perfect stability - all responses identical")
    elif unique_responses <= 3:
        print("  ✅ High stability - minimal variation")
    else:
        print("  ⚠️  Low stability - consider lowering temperature/top_p/top_k")

    return {
        "unique_responses": unique_responses,
        "consistency_rate": consistency_rate,
        "responses": responses
    }

# Test different configurations
print("Testing Maximum Stability:")
verify_stability(
    bedrock,
    "What is 2+2?",
    {"temperature": 0.0, "topP": 0.1, "maxTokens": 50},
    iterations=10
)

print("\nTesting High Stability:")
verify_stability(
    bedrock,
    "What is 2+2?",
    {"temperature": 0.1, "topP": 0.2, "maxTokens": 50},
    iterations=10
)

Key Takeaway

For stable, consistent, and repeatable responses:

  • Temperature: 0.0 - 0.3 (lower = more stable)
  • top_p: 0.1 - 0.3 (lower = more stable)
  • top_k: 1 - 10 (lower = more stable)
  • Stop sequences: Use consistently
  • System prompts: Be specific and structured

Start with temperature=0.1, top_p=0.2, top_k=5 and adjust based on your stability requirements and output quality needs.

Cost Optimization

Additional Resources

Example: Complete Implementation

import boto3
import json
from typing import Optional

class BedrockClient:
    def __init__(self, region_name: str = 'us-east-1'):
        self.client = boto3.client(
            service_name='bedrock-runtime',
            region_name=region_name
        )

    def invoke_claude(
        self,
        prompt: str,
        max_tokens: int = 1024,
        temperature: float = 0.7,
        system_prompt: Optional[str] = None
    ) -> str:
        """Invoke Claude model with specified parameters"""

        request_body = {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "temperature": temperature,
            "messages": [
                {
                    "role": "user",
                    "content": prompt
                }
            ]
        }

        if system_prompt:
            request_body["system"] = system_prompt

        try:
            response = self.client.invoke_model(
                modelId="anthropic.claude-3-sonnet-20240229-v1:0",
                body=json.dumps(request_body)
            )

            response_body = json.loads(response['body'].read())
            return response_body['content'][0]['text']

        except Exception as e:
            print(f"Error invoking model: {e}")
            raise

# Usage
bedrock_client = BedrockClient()
result = bedrock_client.invoke_claude(
    prompt="Explain quantum computing in simple terms",
    temperature=0.5
)
print(result)

Conclusion

AWS Bedrock provides a powerful, unified interface to multiple foundation models. Understanding inference parameters allows you to fine-tune model behavior for your specific use case, balancing creativity, accuracy, and cost.